Constructing and exploiting an automatically annotated resource of legislative texts
Authors
Abstract
In this paper, we report on the construction of a resource of Swiss legislative texts that is automatically annotated with structural, morphosyntactic and content-related information, and we discuss how this resource can be exploited for legislative drafting, legal linguistics and translation, and for the evaluation of legislation. Our resource is based on the classified compilation of Swiss federal legislation. All texts in the classified compilation exist in German, French and Italian; some of them are also available in Romansh and English. Our resource is currently being exploited (a) as a testing environment for developing methods of automated style checking for legislative drafts, (b) as the basis of a statistical multilingual word concordance, and (c) for the empirical evaluation of legislation. The paper describes the domain- and language-specific procedures that we have implemented to provide the automatic annotations needed for these applications.
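The abstract mentions a statistical multilingual word concordance without describing how it is computed. As a hedged illustration only, the sketch below assumes sentence-aligned German-French pairs and uses the Dice coefficient, a common co-occurrence association measure swapped in here for illustration and not necessarily the authors' method, to suggest French counterparts for a German search term. The helper names `tokenize`, `concordance` and `dice_candidates`, the naive tokenizer, and the toy sentences are all hypothetical.

```python
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase and keep runs of Unicode word characters; a deliberately
    # naive tokenizer (assumption), not one aware of legal-text conventions.
    return re.findall(r"\w+", sentence.lower())


def concordance(pairs, term):
    """Return the aligned (source, target) pairs whose source side contains term."""
    term = term.lower()
    return [(src, tgt) for src, tgt in pairs if term in tokenize(src)]


def dice_candidates(pairs, term, top_n=3):
    """Rank target-side words by their Dice coefficient with `term`.

    Dice(s, t) = 2 * joint(s, t) / (count(s) + count(t)), where each count is
    the number of aligned sentence pairs containing the word, not raw frequency.
    """
    term = term.lower()
    src_count = 0
    tgt_counts = Counter()
    joint_counts = Counter()
    for src, tgt in pairs:
        has_term = term in set(tokenize(src))
        if has_term:
            src_count += 1
        for word in set(tokenize(tgt)):
            tgt_counts[word] += 1
            if has_term:
                joint_counts[word] += 1
    scores = {
        word: 2 * joint / (src_count + tgt_counts[word])
        for word, joint in joint_counts.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Invented toy sentence pairs (German, French); not taken from the resource.
pairs = [
    ("Der Bundesrat erlässt die Verordnung.", "Le Conseil fédéral édicte l'ordonnance."),
    ("Der Bundesrat regelt die Einzelheiten.", "Le Conseil fédéral règle les modalités."),
    ("Das Gesetz tritt in Kraft.", "La loi entre en vigueur."),
]

for src, tgt in concordance(pairs, "Bundesrat"):
    print(src, "|", tgt)
print(dice_candidates(pairs, "Bundesrat"))
# -> [('conseil', 1.0), ('fédéral', 1.0), ('le', 1.0)]
```

The function word "le" ties with "conseil" and "fédéral", which suggests why a practical concordance would likely need stop-word filtering, multiword-unit handling (treating "Conseil fédéral" as one unit), or a proper word-alignment model rather than raw sentence-level co-occurrence.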
Similar resources
Exploiting Properties of Legislative Texts to Improve Classification Accuracy
Organizing legislative texts into a hierarchy of legal topics improves access to legislation. Manually placing every part of new legislative texts in the correct place of the hierarchy, however, is expensive and slow, and therefore naturally calls for automation. In this paper, we assess the ability of machine learning methods to develop a model that automatically classifies legislative tex...
Semantic Web Standards and Ontologies for Legislative Drafting Support
Machine-readable open public data and the multilingual web are open challenges promising to transform the relationship between citizens and European institutions. In this context the DALOS project aims at ensuring coherence and alignment in the legislative language, providing law-makers with knowledge management tools to improve the control over the multilingual complexity of European ...
Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor Corpus
In this article we illustrate and evaluate an approach to creating high-quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The trans...
A Corpus-based Approach to Linguistic Function
In this paper, we present our recent experience in constructing a first-of-its-kind functional corpus based on the theoretical framework of Systemic Functional Linguistics. Annotated on selected texts from the Penn Treebank, the corpus was built by a collaborative team on a web-based annotation platform with several advanced features. After a discussion of the background and motivation of the pro...
A Multiform Balanced Dependency Treebank for Romanian
The UAIC-RoDia-DepTb is a balanced treebank containing texts in non-standard language: 2,575 chat sentences, old Romanian texts (a Gospel printed in 1648, a codex of laws printed in 1818, a novel written in 1910), regional popular poetry, legal texts, Romanian and foreign fiction, and quotations. The proportions are comparable; each of these types of texts is represented by subsets of at least 1,...